Happiness is increasingly recognized as a vital measure of human well-being, often offering a more holistic perspective on quality of life than traditional economic metrics alone. The World Happiness Report data spanning 2019 to 2024 covers countries around the globe and assesses how factors ranging from income and social support to health, governance, and freedom contribute to national happiness levels. But beyond the commonly measured economic indicators, how do social and governance factors affect a nation’s happiness? What other variables might influence how happiness is experienced and reported across different regions? One key column, the happiness score, serves as the primary measure of national well-being. Using this score as an indicator of a country’s overall happiness, the general research question is:
What are the most significant economic, social, and governance predictors of national happiness levels, and how can statistical modeling be used to analyze regional variations and forecast future trends using World Happiness Report data from 2019 to 2024?
To address this question, statistical techniques will be applied to examine trends and to model relationships between the predictors and happiness scores. My analysis is driven by the following hypotheses:
H1: Higher GDP per capita, social support, and life expectancy are significantly associated with higher happiness scores across countries.
H2: Perceived freedom and generosity positively influence happiness, but their effect sizes differ by region, suggesting regional interaction effects.
H3: Countries with low perceived corruption tend to report higher happiness, and this relationship strengthens when combined with strong governance indicators (e.g., social support and life expectancy).
The dataset is derived from the World Happiness Report, an annual study based on data collected by the Gallup World Poll, supplemented with official statistics from sources such as the World Bank, World Health Organization (WHO), and the United Nations. The data is collected through large-scale surveys where individuals rate their overall life satisfaction on a scale of 0 to 10.
The dataset has been placed in the /data folder for organization and reproducibility. A README.md file within this folder documents the dataset’s dimensions and provides a codebook explaining each variable.
Below is an overview of the dataset using glimpse() to provide an initial summary.
## Rows: 875
## Columns: 13
## $ Year <int> 2024, 2023, 2022, 2021, 202…
## $ Rank <int> 1, 143, 137, 146, 150, 153,…
## $ Country.name <chr> "Finland", "Afghanistan", "…
## $ Ladder.score <dbl> 7.736, 1.721, 1.859, 2.404,…
## $ upperwhisker <dbl> 7.810, 1.775, 1.923, 2.469,…
## $ lowerwhisker <dbl> 7.662, 1.667, 1.795, 2.339,…
## $ Explained.by..Log.GDP.per.capita <dbl> 1.749, 0.628, 0.645, 0.758,…
## $ Explained.by..Social.support <dbl> 1.783, 0.000, 0.000, 0.000,…
## $ Explained.by..Healthy.life.expectancy <dbl> 0.824, 0.242, 0.087, 0.289,…
## $ Explained.by..Freedom.to.make.life.choices <dbl> 0.986, 0.000, 0.000, 0.000,…
## $ Explained.by..Generosity <dbl> 0.110, 0.091, 0.093, 0.089,…
## $ Explained.by..Perceptions.of.corruption <dbl> 0.502, 0.088, 0.059, 0.005,…
## $ Dystopia...residual <dbl> 1.782, 0.672, 0.976, 1.263,…
Response Variable (Y):
- Happiness Score (Ladder Score): measures overall well-being on a scale of 0 to 10.

Explanatory (Predictor) Variables (X):
- GDP per Capita
- Social Support
- Healthy Life Expectancy
- Freedom to Choose
- Perception of Corruption
- Generosity
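The variables above correspond to the long column names in the glimpse() output. A minimal sketch of shortening them with dplyr::rename() follows. The short names (happiness_score, gdp_per_capita, and so on) are illustrative choices, not necessarily the names used in the analysis, and the two rows are copied from the glimpse() output only so the example runs on its own.

```r
library(dplyr)

# Two rows copied from the glimpse() output above (the full data has 875 rows).
raw <- data.frame(
  Year = c(2024L, 2023L),
  Country.name = c("Finland", "Afghanistan"),
  Ladder.score = c(7.736, 1.721),
  Explained.by..Log.GDP.per.capita = c(1.749, 0.628),
  Explained.by..Social.support = c(1.783, 0.000),
  Explained.by..Healthy.life.expectancy = c(0.824, 0.242),
  Explained.by..Freedom.to.make.life.choices = c(0.986, 0.000),
  Explained.by..Generosity = c(0.110, 0.091),
  Explained.by..Perceptions.of.corruption = c(0.502, 0.088)
)

# The short names are hypothetical; rename() takes new_name = old_name pairs.
happiness <- raw %>%
  rename(
    year = Year,
    country = Country.name,
    happiness_score = Ladder.score,
    gdp_per_capita = Explained.by..Log.GDP.per.capita,
    social_support = Explained.by..Social.support,
    life_expectancy = Explained.by..Healthy.life.expectancy,
    freedom = Explained.by..Freedom.to.make.life.choices,
    generosity = Explained.by..Generosity,
    corruption = Explained.by..Perceptions.of.corruption
  )

names(happiness)
```

Note that the predictor columns are the report's "Explained by" components rather than raw measurements, which is worth keeping in mind when interpreting their scale.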
Data Cleaning and Preprocessing
Columns were renamed using the dplyr::rename() function.

The mean and median of the happiness scores indicate that most scores are close to the middle of the scale. The standard deviation suggests moderate variability in the data, and the range reflects a broad distribution of values. The negative skewness indicates that, while some countries have low happiness scores, the majority have scores closer to the higher end of the scale.
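The summary statistics discussed here can be computed along the following lines. This is a sketch in base R: skewness() and describe_scores() are hypothetical helpers (base R has no built-in skewness function, so a simple moment-based estimate is assumed), and toy_scores is a made-up vector, not data from the report.

```r
# Moment-based sample skewness (third central moment over variance^1.5).
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Collect the statistics mentioned in the text into one named vector.
describe_scores <- function(x) {
  c(
    mean = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE),
    skewness = skewness(x)
  )
}

# Made-up scores for illustration only, not values from the dataset.
toy_scores <- c(2.4, 4.8, 5.5, 6.0, 6.3, 6.9, 7.5)
round(describe_scores(toy_scores), 3)
```

A negative skewness value, as in this toy vector with its single low score, corresponds to a longer tail on the left of the distribution.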
The histogram displays the distribution of happiness scores across countries. The data shows a left-skewed distribution with most countries having happiness scores between 5 and 7, which aligns with the fact that the mean and median values are relatively high. The distribution also reveals a long tail towards the lower end, indicating a smaller number of countries with significantly lower happiness scores.
The histogram of GDP per Capita shows a right-skewed distribution, with most countries clustered around a value of 1.5 on the scale of the Explained.by..Log.GDP.per.capita column (a contribution to the ladder score, not a dollar amount). Many countries have moderate economic prosperity, while a few wealthier nations form the long right tail. This disparity between wealthier and less affluent nations is essential context for understanding how economic factors may influence national well-being and happiness.
Each of these factors (GDP per capita, social support, and healthy life expectancy) shows a positive correlation with happiness. Economic prosperity, strong social networks, and better health are each associated with higher levels of national happiness.
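Pairwise correlations of this kind can be computed as sketched below. The data frame is synthetic (generated with an arbitrary positive relationship purely so the code runs), and the column names are assumed short names rather than the raw column names in the dataset.

```r
set.seed(1)
n <- 50

# Synthetic predictors; ranges and coefficients are arbitrary illustrations.
toy <- data.frame(
  gdp_per_capita = runif(n, 0, 2),
  social_support = runif(n, 0, 1.8),
  life_expectancy = runif(n, 0, 0.9)
)
toy$happiness_score <- 2 + toy$gdp_per_capita + toy$social_support +
  2 * toy$life_expectancy + rnorm(n, sd = 0.2)

predictors <- c("gdp_per_capita", "social_support", "life_expectancy")

# Pearson correlation of each predictor with the happiness score.
correlations <- sapply(toy[predictors], cor, y = toy$happiness_score)
round(correlations, 2)
```

On the real data, the same call would take the cleaned data frame in place of toy.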
The bar plot shows median happiness scores by region from 2019 to 2024, with a general upward trend across all regions. Europe and Oceania consistently report the highest scores and Africa the lowest, while the Americas and Asia fall in the middle range.
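The medians behind such a plot could be obtained as in the sketch below. It assumes a region column has already been attached to the data (the glimpse() output shows no such column, so a country-to-region lookup would be needed first), and every value in toy is made up for illustration.

```r
# Made-up scores for two regions and two years; not values from the dataset.
toy <- data.frame(
  region = rep(c("Europe", "Africa"), each = 3, times = 2),
  year = rep(c(2019, 2024), each = 6),
  happiness_score = c(6.0, 7.0, 7.5, 4.0, 4.5, 5.0,
                      6.2, 7.1, 7.6, 4.1, 4.4, 5.2)
)

# Median happiness score for each region-year combination.
medians <- aggregate(happiness_score ~ region + year, data = toy, FUN = median)
medians
```

The resulting data frame has one row per region and year, which is the shape a grouped bar plot expects.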
Exploratory data analysis will be conducted to examine distributions, detect outliers, and summarize key variables. For H1, the association between happiness and GDP per Capita, Social Support, and Life Expectancy will be assessed using scatter plots and linear regression.
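As a rough sketch of the H1 model, a multiple linear regression can be fitted with lm(). The data below is synthetic and the column names are assumed short names, so the coefficients say nothing about the real dataset.

```r
set.seed(42)
n <- 100

# Synthetic data; the generating coefficients are arbitrary.
toy <- data.frame(
  gdp_per_capita = runif(n, 0, 2),
  social_support = runif(n, 0, 1.8),
  life_expectancy = runif(n, 0, 0.9)
)
toy$happiness_score <- 2 + toy$gdp_per_capita + toy$social_support +
  toy$life_expectancy + rnorm(n, sd = 0.3)

# H1: happiness as a linear function of the three predictors.
fit_h1 <- lm(
  happiness_score ~ gdp_per_capita + social_support + life_expectancy,
  data = toy
)

# Estimates, standard errors, t statistics, and p-values.
summary(fit_h1)$coefficients
```

The sign and p-value of each coefficient are what H1 would be judged against.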
H2 will be explored by incorporating perceived Freedom and Generosity into the model, with interaction terms included to examine regional differences in effect sizes.
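One way to test for regional differences in effect size is to compare an additive model with one that includes predictor-by-region interactions. The sketch below uses synthetic data with hypothetical region labels and assumed column names. The F-test via anova() is one possible comparison, not necessarily the approach the final analysis will take.

```r
set.seed(7)
n <- 150

# Synthetic data; region labels and effect sizes are illustrative only.
toy <- data.frame(
  region = factor(sample(c("Africa", "Asia", "Europe"), n, replace = TRUE)),
  freedom = runif(n, 0, 1),
  generosity = runif(n, 0, 0.4)
)
# Give one region a steeper freedom slope so an interaction exists to detect.
slope <- ifelse(toy$region == "Europe", 2, 1)
toy$happiness_score <- 4 + slope * toy$freedom + toy$generosity +
  rnorm(n, sd = 0.3)

# Additive model: same slopes in every region.
fit_additive <- lm(happiness_score ~ freedom + generosity + region, data = toy)

# Interaction model: slopes for freedom and generosity may differ by region.
fit_interact <- lm(happiness_score ~ (freedom + generosity) * region, data = toy)

# F-test of whether the interaction terms improve the fit.
comparison <- anova(fit_additive, fit_interact)
comparison
```

A small p-value in the comparison would indicate that at least one slope varies across regions, which is the pattern H2 anticipates.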
For H3, the relationship between Corruption Perception and Happiness will be analyzed, both independently and in interaction with governance-related variables such as Social Support and Life Expectancy, to assess compound effects.
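The H3 analysis could be sketched as two models: corruption perception alone, then corruption interacting with social support and life expectancy. The data is synthetic and the column names are assumed. In the glimpse() output, Finland has the larger value in the corruption column, so larger values appear to correspond to lower perceived corruption; that reading is an assumption here.

```r
set.seed(11)
n <- 120

# Synthetic data; the generating coefficients are arbitrary.
toy <- data.frame(
  corruption = runif(n, 0, 0.5),
  social_support = runif(n, 0, 1.8),
  life_expectancy = runif(n, 0, 0.9)
)
toy$happiness_score <- 3 + toy$corruption + toy$social_support +
  toy$life_expectancy + 1.5 * toy$corruption * toy$social_support +
  rnorm(n, sd = 0.3)

# Corruption perception on its own.
fit_main <- lm(happiness_score ~ corruption, data = toy)

# Corruption interacting with the two governance-related variables.
fit_h3 <- lm(
  happiness_score ~ corruption * social_support + corruption * life_expectancy,
  data = toy
)

summary(fit_h3)$coefficients
```

The corruption:social_support and corruption:life_expectancy rows carry the compound effects that H3 refers to.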
The data dictionary can be found here.